Asap: Automatic Speculative Acyclic Parallelization for Clusters
نویسنده
چکیده
While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for clusters due to high internode communication cost and lack of globally shared memory. Speculative Decoupled Software Pipelining (Spec-DSWP) is a promising automatic parallelization technique for clusters that speculatively partitions a loop into multiple threads that communicate in a pipelined manner. Speculation can complement conservative static analysis, making automatic parallelization more robust and applicable. Pipelining allows Spec-DSWP to speculate only rarely occurring dependences while respecting the other dependences through communication among threads. Acyclic communication patterns in pipelining make the parallelized programs tolerant of high communication latency of clusters. However, since Spec-DSWP partitions a loop iteration (a transaction) into multiple sub-transactions across multiple threads according to the pipeline stages, a special runtime system is required that supports multi-threaded transactions (MTXs). This dissertation proposes the Automatic Speculative Acyclic Parallelization (ASAP) system that enables Spec-DSWP for clusters without any hardware modification. The ASAP system supports various speculation techniques that require different validation and communication costs, and automatically parallelizes sequential loops using the SpecDSWP transformation with the optimal application of the speculation techniques. The ASAP system efficiently supports MTXs to correctly execute the speculatively transformed programs on clusters. With synergistic combination of speculation, acyclic communication, and runtime system support, this approach achieves or demonstrates a path to achieve scalable performance speedup up to 109× for a wide range of applications on clusters without any hardware modification.
منابع مشابه
Run-time Parallelization Techniques for Sparse Applications
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we have introduced a novel framework for their identification: speculative parallelization. While we have previously shown that this method is inherently scalable its p...
متن کاملA Parallelizing Compilation Method for the Map-oriented Machine
The paper introduces a novel parallelizing compilation method for the MoM. The MoM (Map-oriented Machine) is an Xputer architecture featuring multiple data sequencers and “soft ALUs”. The compiler accepts C-source, which are restructured and partitioned into structural and sequential code providing parallelism at expression and statement level.
متن کاملThe Potential of Synergistic Static, Dynamic and Speculative Loop Nest Optimizations for Automatic Parallelization
Research in automatic parallelization of loop-centric programs started with static analysis, then broadened its arsenal to include dynamic inspection-execution and speculative execution, the best results involving hybrid static-dynamic schemes. Beyond the detection of parallelism in a sequential program, scalable parallelization on many-core processors involves hard and interesting parallelism ...
متن کاملSupport for Thread-Level Speculation into OpenMP
– In-depth knowledge of the problem. – Understanding of the underlying architecture. – Knowledge on the parallel programming model. • OpenMP allows to parallelize code “avoiding” these requirements. • Compilers’ automatic parallelization only proceed when there is no risk. • Thread-Level Speculation (TLS) can extract parallelism when a compile-time dependence analysis can not guarantee that the...
متن کاملQinling: A Parametric Model in Speculative Multithreading
Speculative multithreading (SpMT) is a thread-level automatic parallelization technique that can accelerate sequential programs, especially for irregular applications that are hard to be parallelized by conventional approaches. Thread partition plays a critical role in SpMT. Conventional machine learning-based thread partition approaches applied machine learning to offline guide partition, but ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013